Because the emphasis of the LHC is on $5\sigma$ discoveries and the LHC environment induces high systematic errors, many 
of the common statistical procedures used in High Energy Physics are not adequate. I review the basic ingredients 
of LHC searches, the sources of systematics, and the performance of several methods. Finally, I indicate the methods 
that seem most promising for the LHC and areas that are in need of further study. 



1 Introduction 

The Large Hadron Collider (LHC) at CERN and the 
two multipurpose detectors, Atlas and CMS, have 
been built in order to discover the Higgs boson, if it 
exists, and explore the theoretical landscape beyond 
the Standard Model. 1,2 The LHC will collide protons
with unprecedented center-of-mass energy ($\sqrt{s} = 14$
TeV) and luminosity ($10^{34}$ cm$^{-2}$ s$^{-1}$); the Atlas and
CMS detectors will record these interactions with
$\sim 10^8$ individual electronic readouts per event. Because
the emphasis of the physics program is on discovery
and the experimental environment is so complex,
the LHC poses new challenges to our statistical
methods - challenges we must meet with the same
vigor that led to the theoretical and experimental
advancements of the last decade.

In the remainder of this Section, I introduce the 
physics goals of the LHC and most pertinent factors 
that complicate data analysis. I also review the for- 
mal link and the practical differences between confi- 
dence intervals and hypothesis testing. 

In Sec. 2, the primary ingredients to new particle 
searches are discussed. Practical and toy examples 
are presented in Sec. 3, which will be used to assess 
the most common methods in Sec. 4. The remainder 
of this paper is devoted to discussion on the most 
promising methods for the LHC. 



1.1 Physics Goals of the LHC 

Currently, our best experimentally justified model 
for fundamental particles and their interactions is 
the standard model. In short, the physics goals of 
the LHC come in two types: those that improve our 
understanding of the standard model, and those that 
go beyond it. 



The only particle of the standard model that has 
not been observed is the Higgs boson, which is key for 
the standard model's description of the electroweak 
interactions. The mass of the Higgs boson, $m_H$, is
a free parameter in the standard model, but there
exist direct experimental lower bounds and more indirect
upper bounds. Once $m_H$ is fixed, the standard
model is a completely predictive theory. There are 
numerous particle-level Monte Carlo generators that 
can be interfaced with simulations of the detectors 
to predict the rate and distribution of all experimen- 
tal observables. Because of this predictive power, 
searches for the Higgs boson are highly tuned and of- 
ten employ multivariate discrimination methods like 
neural networks, boosted decision trees, support vector
machines, and genetic programming. 3,4,5

While the Higgs boson is key for understand- 
ing the electroweak interactions, it introduces a new 
problem: the hierarchy problem. There are
several proposed solutions to the problem, one of 
which is to introduce a new fundamental symmetry, 
called supersymmetry (SUSY), between bosons and 
fermions. In practice, the minimal supersymmetric 
extension to the standard model (MSSM), with its 
105 parameters, is not so much a theory as a theo- 
retical framework. 

The key difference between SUSY and Higgs
searches is that, in most cases, discovering SUSY 
will not be the difficult part. Searches for SUSY 
often rely on robust signatures that will show a de- 
viation from the standard model for most regions of 
the SUSY parameter space. It will be much more 
challenging to demonstrate that the deviation from 
the standard model is SUSY and to measure the fun- 
damental parameters of the theory. 6 In order to re- 
strict the scope of these proceedings, I shall focus 
on LHC Higgs searches, where the issues of hypothesis
testing are more relevant. 

1.2 The Challenges of the LHC Environment

The challenges of the LHC environment are mani- 
fold. The first and most obvious challenge is due 
to the enormous rate of uninteresting background 
events from QCD processes. The total interaction 
rate for the LHC is of order $10^9$ interactions per second;
the rate of Higgs production is about ten orders
of magnitude smaller. Thus, to understand the back- 
ground of a Higgs search, one must understand the 
extreme tails of the QCD processes. 

Compounding the difficulties due to the extreme 
rate is the complexity of the detectors. The full-fledged
simulation of the detectors is extremely computationally
intensive, with samples of $10^7$ events
taking about a month to produce with computing
resources distributed around the globe. This computational
limitation constrains the problems that can
be addressed with Monte Carlo techniques.

Theoretical uncertainties also contribute to the 
challenge. The background to many searches re- 
quires calculations at, or just beyond, the state- 
of-the-art in particle physics. The most common 
situation requires a final state with several well-separated
high transverse momentum objects (e.g.
$t\bar{t}jj \to b\ell\nu b\,jjjj$), in which the regions of physical
interest are not reliably described by leading-order
perturbative calculations (due to infra-red
and collinear divergences), are too complex for the
requisite next-to-next-to-leading order calculations,
and are not properly described by the parton-shower
models alone. Enormous effort has gone
into improving the situation with next-to-leading order
calculations and matrix-element parton-shower
matching. 7,8 While these new tools are a vast improvement,
the residual uncertainties are still often
dominant.

Uncertainties from non-perturbative effects are 
also important. For some processes, the relevant 
regions of the parton distribution functions are not 
well-measured (and probably will not be in the first 
few years of LHC running), which leads to uncertainties
in the rate as well as the shape of distributions. Furthermore,
the various underlying-event and multiple-interaction
models used to describe data from previous
colliders show large deviations when extrapolated
to the LHC. 9 This soft physics has a large
impact on the performance of observables such as 
missing transverse energy. 



In order to augment the simulated data chain, 
most searches introduce auxiliary measurements to 
estimate their backgrounds from the data itself. In 
some cases, the background estimation is a simple 
sideband, but in others the link between the auxiliary
measurement and the quantity of interest is based on
simulation. This hybrid approach is of particular 
importance at the LHC. 

While many of the issues discussed above are not 
unique to the LHC, they are often more severe. At 
LEP, it was possible to generate Monte Carlo sam- 
ples of larger size than the collected data, QCD back- 
grounds were more tame, and most searches were not 
systematics-limited. The Tevatron has much more in
common with the LHC; however, at this point dis- 
covery is less likely, and most of the emphasis is on 
measurements and limit setting. 

1.3 Confidence Intervals & Hypothesis Testing 

The last several conferences in the vein of PhyStat 
2005 have concentrated heavily on confidence intervals,
in particular 95% confidence intervals for some
physics parameter in an experiment that typically
has few events. More recently, there has been a large
effort in understanding how to include systematic er- 
rors and nuisance parameters into these calculations. 

LHC searches, in contrast, are primarily interested
in $5\sigma$ discovery. The $5\sigma$ discovery criterion is
somewhat vague, but usually interpreted in a frequentist
sense as a hypothesis test with a rate of
Type I error $\alpha = 2.85 \cdot 10^{-7}$.
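As a quick numerical check of this convention, the sketch below evaluates the one-sided Gaussian tail probability beyond $Z = 5$ using only the Python standard library. The function name is illustrative and not taken from any HEP package; the result is of order $3 \cdot 10^{-7}$, close to the value quoted above.

```python
import math

def one_sided_tail(z):
    """One-sided Gaussian tail probability P(X > z) for a standard normal X."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

alpha_5sigma = one_sided_tail(5.0)    # Type I error rate for a 5 sigma test
alpha_95cl = one_sided_tail(1.645)    # roughly 0.05, for comparison with 95% CL
print(f"5 sigma: {alpha_5sigma:.3e}   1.645 sigma: {alpha_95cl:.4f}")
```

The comparison with the 95% case illustrates how far into the tail a discovery claim has to reach: the two rates differ by more than five orders of magnitude.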

There is a formal link between confidence inter- 
vals and hypothesis testing: frequentist confidence 
intervals from the Neyman construction are formally 
inverted hypothesis tests. It is this equivalence that 
links the Neyman-Pearson lemma^a to the ordering
rule used in the unified method of Feldman and 
Cousins. 10 Furthermore, this equivalence will be very 
useful in translating our understanding of confidence 
intervals to the searches at the LHC. 

In some cases, this formal link can be misleading.
In particular, there is not always a continuous
parameter that links the fully specified null hypothesis
$H_0$ to the fully specified alternate $H_1$ in any
physically interesting or justified way. Furthermore,
the performance of a method for a 95% confidence
interval and a $5\sigma$ discovery can be quite different.

a The lemma states that, for a simple hypothesis test of size
$\alpha$ between a null $H_0$ and an alternate $H_1$, the most powerful
critical region in the observable $x$ is given by a contour of the
likelihood ratio $L(x|H_0)/L(x|H_1)$.

Figure 1. Expected significance as a function of Higgs mass
for the Atlas detector with 30 fb$^{-1}$ of data.

2 The Ingredients of an LHC Search 

In order to assess the statistical methods that are 
available and develop new ones suited for the LHC, it 
is necessary to be familiar with the basic ingredients 
of the search. In this section, the basic ingredients, 
terminology, and nomenclature are established. 

2.1 Multiple Channels & Processes 

Almost all new particle searches do not observe the 
particle directly, but through the signatures left by 
the decay products of the particle. For instance, the 
Higgs boson will decay long before it interacts with 
the detector, but its decay products will be detected. 
In many cases, the particle can be produced and de- 
cay in many different configurations, each of which is 
called a search channel (see Tab. 1). There may
be multiple signal and background processes which
contribute to each channel. For example, in $H \to \gamma\gamma$,
the signal could come from any Higgs production
mechanism and the background from either continuum
$\gamma\gamma$ production or QCD backgrounds where jets
fake photons. Each of these processes has its own
rates, distributions for observables, and uncertainties.
Furthermore, the uncertainties between processes
may be correlated.



In general the theoretical model for a new particle
has some free parameters. In the case of the standard
model Higgs, only the mass $m_H$ is unknown.
For SUSY scenarios, the Higgs model is parametrized
by two parameters: $m_A$ and $\tan\beta$. Typically, the unknown
variables are scanned and a hypothesis test is
performed for each value of these parameters. The 
results from each of the search channels can be com- 
bined to enhance the power of the search, but one 
must take care of correlations among channels and 
ensure consistency. 

The fact that one scans over the parameters and 
performs many hypothesis tests increases the chance 
that one finds at least one large fluctuation from the 
null hypothesis. Some approaches incorporate the
number of trials explicitly, 11 some approaches only
focus on the most interesting fluctuation, 12 and some
see this heightened rate of Type I error as the motivation
for the stringent $5\sigma$ requirement. 13
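The effect of repeated testing can be illustrated with a short calculation. The sketch below assumes that the scanned hypothesis tests are statistically independent, which is an idealization: neighbouring mass points in a real scan are correlated, so the number of effective trials is smaller than the number of scan points. The function name is hypothetical.

```python
import math

def global_type1_rate(alpha_local, n_trials):
    """Probability of at least one false discovery among n_trials
    independent tests, each with Type I error rate alpha_local."""
    # 1 - (1 - alpha)^n, written with log1p/expm1 to stay accurate for tiny alpha
    return -math.expm1(n_trials * math.log1p(-alpha_local))

for n in (1, 10, 100):
    print(n, global_type1_rate(2.85e-7, n), global_type1_rate(0.05, n))
```

For a local rate as small as the $5\sigma$ value, the global rate grows almost linearly with the number of trials, whereas for a 5% test it saturates quickly.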

2.2 Discriminating Variables & Test Statistics 

Typically, new particles are known to decay with cer- 
tain characteristics that distinguish the signal events 
from those produced by background processes. Much 
of the work of a search is to identify those observ- 
ables and to construct new discriminating variables 
(generically denoted as $m$). Examples include angles
between particles, invariant masses, and particle
identification criteria. Discriminating variables
are used in two different ways: to define a signal-like 
region and to weight events. 

The usage of discriminating variables is related 
to the test statistic: the real-valued quantity used 
to summarize the experiment. The test statistic is 
thought of as being ordered such that either large or 
small values indicate growing disagreement with the 
null hypothesis. 

A simple "cut analysis" consists of defining a 
signal-like region bounded by upper- and lower- 
values of these discriminating variables and counting 
events in that region. In that case, the test statistic 
is simply the number of events observed in the signal-like
region. One expects $b$ background events and $s$
signal events, so the experimental sensitivity is op- 
timized by adjusting the cut values. More sophisti- 
cated techniques use multivariate algorithms, such as 
neural networks, to define more complicated signal- 
like regions, but the test statistic remains unchanged. 
In these number-counting analyses, the likelihood of
observing $n$ events is simply given by the Poisson
model.
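For a number-counting analysis with a perfectly known background, the p-value of an observation is just a Poisson tail sum. A minimal sketch follows, assuming a known expected background $b > 0$ and no systematic uncertainty; the function names are illustrative.

```python
import math

def poisson_pmf(n, mean):
    """Poisson probability of n events, evaluated in log space for stability."""
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))

def poisson_p_value(n_obs, b, rel_tol=1e-15):
    """P(n >= n_obs | b): probability that background alone fluctuates
    up to at least n_obs events. The upper tail is summed directly so that
    very small p-values do not suffer from cancellation."""
    total, n = 0.0, n_obs
    while True:
        term = poisson_pmf(n, b)
        total += term
        # stop once we are past the mode and the terms no longer matter
        if n > b and term <= rel_tol * total:
            return total
        n += 1

print(poisson_p_value(150, 100.0))
```

Summing the upper tail, rather than computing one minus the lower tail, matters at the $5\sigma$ level, where the p-value is many orders of magnitude smaller than one.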

There are extensions to this number-counting
technique. In particular, if one knows the distribution
of the discriminating variable $m$ for
the background-only (null) hypothesis, $f_b(m)$, and
the signal-plus-background (alternate) hypothesis,
$f_{s+b}(m) = [s f_s(m) + b f_b(m)]/(s + b)$, then there is
a more powerful test statistic than simply counting
events. This is intuitive: a well-measured 'golden
event' is often more convincing than a few messy
ones. Following the Neyman-Pearson lemma, the
most powerful test statistic is

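$$
Q = \frac{L(m|H_1)}{L(m|H_0)}
  = \frac{\prod_i^{N_{chan}} \mathrm{Pois}(n_i|s_i + b_i) \prod_j^{n_i} \frac{s_i f_s(m_{ij}) + b_i f_b(m_{ij})}{s_i + b_i}}
         {\prod_i^{N_{chan}} \mathrm{Pois}(n_i|b_i) \prod_j^{n_i} f_b(m_{ij})}
\qquad (1)
$$

($n_i$ denotes events in the $i$th channel) or equivalently

$$
q = \ln Q = -s_{tot} + \sum_i^{N_{chan}} \sum_j^{n_i} \ln\left(1 + \frac{s_i f_s(m_{ij})}{b_i f_b(m_{ij})}\right)
\qquad (2)
$$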

The test statistic in Eq. 2 was used by the LEP 
Higgs Working Group (LHWG) in their final results 
on the search for the Standard Model Higgs. 14 
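The log-likelihood ratio of Eq. 2 is straightforward to evaluate once the per-channel rates and shapes are known. The sketch below is a minimal illustration, not the LHWG implementation: the channel layout (a dictionary with keys `s`, `b`, `f_s`, `f_b`, `events`), the Gaussian signal shape, and the flat background are all assumptions chosen for the example.

```python
import math

def log_likelihood_ratio(channels):
    """q = ln Q: minus the total expected signal plus one log term per event."""
    q = 0.0
    for ch in channels:
        q -= ch["s"]
        for m in ch["events"]:
            ratio = ch["s"] * ch["f_s"](m) / (ch["b"] * ch["f_b"](m))
            q += math.log(1.0 + ratio)
    return q

def gaussian_pdf(m, mean, width):
    z = (m - mean) / width
    return math.exp(-0.5 * z * z) / (width * math.sqrt(2.0 * math.pi))

# Hypothetical single channel: Gaussian signal peak on a flat background.
channel = {
    "s": 5.0,
    "b": 20.0,
    "f_s": lambda m: gaussian_pdf(m, 120.0, 2.0),
    "f_b": lambda m: 1.0 / 50.0,   # flat over a 50-unit-wide mass window
    "events": [104.2, 119.6, 121.1, 143.8],
}
q_obs = log_likelihood_ratio([channel])
print(q_obs)
```

Events near the assumed signal peak contribute large positive terms, while events far from it contribute almost nothing. When $f_s = f_b$ the expression reduces to the number-counting result $-s + n \ln(1 + s/b)$.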

At this point, there are two loose ends: how does 
one determine the distribution of the discriminating 
variables f(m), and how does one go from Eq. 2 to 
the distribution of $q$ for $H_0$ and $H_1$. These are the
topics of the next subsections. 

2.3 Parametric and Non-Parametric Methods

In some cases, the distribution of a discriminat- 
ing variable f(m) can be parametrized and this 
parametrization can be justified either by physics ar- 
guments or by goodness-of-fit. However, there are 
many cases in which $f(m)$ has a complicated shape
not easily parametrized. For instance, Fig. 2 shows 
the distribution of a neural network output for signal 
events. In that case kernel estimation techniques can 
be used to estimate f(m) in a non-parametric way 
from a sample of events $\{m_i\}$. 15 The technique that
was used by the LHWG 14 was based on an adaptive 
kernel estimation given by: 



$$
\hat{f}_1(m) = \sum_i^n \frac{1}{n\,h(m_i)} K\left(\frac{m - m_i}{h(m_i)}\right)
\qquad (3)
$$

where

$$
h(m_i) = \left(\frac{4}{3}\right)^{1/5} \sqrt{\frac{\sigma}{\hat{f}_0(m_i)}}\; n^{-1/5},
\qquad (4)
$$

$\sigma$ is the standard deviation of $\{m_i\}$, $K(x)$ is some kernel
function (usually the normal distribution), and
$\hat{f}_0(m)$ is the fixed kernel estimate given by the same
equation but with a fixed $h(m_i)$

$$
h^* = \left(\frac{4}{3}\right)^{1/5} \sigma\, n^{-1/5}.
\qquad (5)
$$

Figure 2. The distribution of a neural network output for signal
events. The histogram is shown together with $\hat{f}_1(m)$.



The solid line in Fig. 2 shows that the method 
(with modified-boundary kernels) works very well for 
shapes with complicated structure at many scales. 
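The two-pass structure of the adaptive estimate (a fixed-width pass followed by a pass with per-event widths) can be sketched as follows. This is a minimal illustration under stated assumptions: a Gaussian kernel, the normal-reference constant $(4/3)^{1/5}$ in the bandwidths, a toy two-peak sample, and no boundary kernels. It is not the code used by the LHWG.

```python
import math
import random

def gaussian_kernel(x):
    """Standard normal density, used as the kernel K(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def fixed_kernel_estimate(m, sample, h_star):
    """First pass: every event gets the same width h_star."""
    n = len(sample)
    return sum(gaussian_kernel((m - mi) / h_star) for mi in sample) / (n * h_star)

def build_adaptive_estimate(sample):
    """Second pass: return a function f1(m) with one width per event."""
    n = len(sample)
    mean = sum(sample) / n
    sigma = math.sqrt(sum((mi - mean) ** 2 for mi in sample) / (n - 1))
    prefactor = (4.0 / 3.0) ** 0.2 * n ** (-0.2)   # assumed normal-reference constant
    h_star = prefactor * sigma                      # global width (Eq. 5)
    # Per-event widths (Eq. 4): narrow where the density is high, wide in sparse tails.
    widths = [prefactor * math.sqrt(sigma / fixed_kernel_estimate(mi, sample, h_star))
              for mi in sample]

    def f1(m):
        return sum(gaussian_kernel((m - mi) / hi) / hi
                   for mi, hi in zip(sample, widths)) / n
    return f1

# Toy sample standing in for a neural network output with two peaks.
random.seed(1)
toy_sample = ([random.gauss(0.3, 0.10) for _ in range(150)]
              + [random.gauss(0.8, 0.05) for _ in range(50)])
f1 = build_adaptive_estimate(toy_sample)
print(f1(0.3), f1(0.8), f1(0.55))
```

Because each event carries its own width, the estimate can follow a narrow peak without producing spurious wiggles in regions with few events. A bounded variable such as a network output would additionally need the boundary treatment mentioned above, which this sketch omits.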

2.4 Numerical Evaluation of Significance 

Given $f_s(m)$ and $f_b(m)$, the distribution of $q$ can
be constructed. For the background-only hypothesis,
$f_b(m)$ provides the probability of corresponding
values of $q$ needed to define the single-event pdf $\rho_{1,b}$.^b

$$
\rho_{1,b}(q_0) = \int f_b(m)\, \delta(q(m) - q_0)\, dm
\qquad (6)
$$



For multiple events, the distribution of the log-likelihood
ratio must be obtained from repeated convolutions
of the single-event distribution. This convolution
can either be performed implicitly with approximate
Monte Carlo techniques, 16 or analytically
with a Fourier transform technique. 17 In the Fourier
domain, denoted with a bar, the distribution of the
log-likelihood for $n$ events is

$$
\bar{\rho}_n = \bar{\rho}_1^{\,n}
\qquad (7)
$$



b The integral is necessary because the map $q(m) : m \to q$ may
be many-to-one.

Thus the expected log-likelihood distribution for
background with Poisson fluctuations in the number
of events takes the form

$$
\rho_b(q) = \sum_{n=0}^{\infty} \frac{e^{-b} b^n}{n!}\, \rho_{n,b}(q)
\qquad (8)
$$

which in the Fourier domain is simply^c

$$
\bar{\rho}_b(q) = e^{b[\bar{\rho}_{1,b}(q) - 1]}.
\qquad (9)
$$

For the signal-plus-background hypothesis we expect
$s$ events from the $\rho_{1,s}$ distribution and $b$ events from
the $\rho_{1,b}$ distribution, which leads to the expression
for $\rho_{s+b}$ in the Fourier domain

$$
\bar{\rho}_{s+b}(q) = e^{b[\bar{\rho}_{1,b}(q) - 1] + s[\bar{\rho}_{1,s}(q) - 1]}.
\qquad (10)
$$



This equation generalizes, in a somewhat obvious 
way, to include many processes and channels. 
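The Fourier-domain exponentiation can be illustrated on a binned grid. The sketch below is an assumption-laden toy: it uses a naive discrete Fourier transform written with the standard library (a real implementation would use an FFT routine), it works on a grid of bins in the summed per-event log-likelihood ratio (the constant $-s_{tot}$ offset is just a shift of the axis), and all names are hypothetical.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform; an FFT would be used in practice."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * j * k / n)
                  for j, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out

def compound_poisson(single_event_pmfs, rates):
    """Binned distribution of the summed per-event q when the number of
    events from each process is Poisson distributed.

    single_event_pmfs: binned single-event distributions on a common grid
    rates: expected number of events per process, e.g. [b] or [b, s]
    """
    n_bins = len(single_event_pmfs[0])
    exponent = [0j] * n_bins
    for pmf, rate in zip(single_event_pmfs, rates):
        transformed = dft(pmf)
        exponent = [e + rate * (t - 1.0) for e, t in zip(exponent, transformed)]
    # Complex exponentiation in the Fourier domain, then transform back.
    result = dft([cmath.exp(e) for e in exponent], inverse=True)
    return [max(r.real, 0.0) for r in result]

# Toy check: if every event contributes exactly one bin unit, the result
# must be the Poisson distribution in the number of events.
n_bins = 64
single_event = [0.0] * n_bins
single_event[1] = 1.0
rho_b = compound_poisson([single_event], [3.0])
cl_b = sum(rho_b[8:])   # tail sum above an observed bin, in the spirit of CL_b
print(rho_b[:5], cl_b)
```

The grid must be long enough to hold the full distribution: because the discrete transform is periodic, any probability beyond the last bin would reappear at the start of the array.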

Numerically these computations are carried out
with the Fast Fourier Transform (FFT). The FFT
is performed on a finite and discrete array, beyond
which the function is considered to be periodic. Thus
the range of the $\rho_1$ distributions must be sufficiently
large to hold the resulting $\rho_b$ and $\rho_{s+b}$ distributions.
If it is not, the "spill over" beyond the maximum
log-likelihood ratio $q_{max}$ will "wrap around",
leading to unphysical $\rho$ distributions. Because the
range of $\rho_b$ is much larger than that of $\rho_{1,b}$, it requires a
very large number of samples to describe both distributions
simultaneously. The implementation of this
method requires some approximate asymptotic techniques
that describe the scaling from $\rho_{1,b}$ to $\rho_b$. 18

Round-off errors inherent to the FFT limit the
numerical precision to about $10^{-16}$, which limits
the method to significance levels below about $8\sigma$.
Extrapolation techniques and arbitrary precision
calculations can overcome these difficulties, 18 but
such small p-values are of little practical interest.

From the log-likelihood distribution of the two
hypotheses we can calculate a number of useful quantities.
Given some experiment with an observed log-likelihood
ratio, $q^*$, we can calculate the background-only
confidence level, $CL_b$:

$$
CL_b(q^*) = \int_{q^*}^{\infty} \rho_b(q')\, dq'
\qquad (11)
$$



c Perhaps it is worth noting that $\bar{\rho}(q)$ is a complex-valued
function of the Fourier conjugate variable of $q$. Thus numerically
the exponentiation in Eq. 9 requires Euler's formula,
$e^{i\theta} = \cos\theta + i\sin\theta$.



In the absence of an observation we can calculate the
expected $CL_b$ given the signal-plus-background hypothesis
is true. To do this we first must find the median
of the signal-plus-background distribution, $q_{s+b}$.
From this we can calculate the expected $CL_b$ by
using Eq. 11 evaluated at $q^* = q_{s+b}$.

Finally, we can convert the expected background
confidence level into an expected Gaussian significance,
$Z\sigma$, by finding the value of $Z$ which satisfies

$$
CL_b(q_{s+b}) = \frac{1 - \mathrm{erf}(Z/\sqrt{2})}{2}
\qquad (12)
$$

where $\mathrm{erf}(Z) = (2/\sqrt{\pi}) \int_0^Z \exp(-y^2)\, dy$ is a function
readily available in most numerical libraries. For $Z >$
1.5, the relationship can be approximated 19 as

$$
Z \approx \sqrt{u - \ln u} \quad \text{with} \quad u = -2 \ln(CL_b \sqrt{2\pi})
\qquad (13)
$$
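The conversion from a tail probability to a Gaussian significance, and the quality of the approximation in Eq. 13, can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library for the exact inversion of Eq. 12; the function names are illustrative.

```python
import math
from statistics import NormalDist

def z_exact(p):
    """Invert Eq. 12: the Z for which the one-sided Gaussian tail equals p."""
    return -NormalDist().inv_cdf(p)

def z_approx(p):
    """Asymptotic approximation of Eq. 13, intended for Z > 1.5."""
    u = -2.0 * math.log(p * math.sqrt(2.0 * math.pi))
    return math.sqrt(u - math.log(u))

for z_true in (2.0, 3.0, 5.0):
    p = 0.5 * math.erfc(z_true / math.sqrt(2.0))
    print(z_true, z_exact(p), z_approx(p))
```

The approximation improves as the tail probability shrinks, which is the regime of interest for discovery.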

2.5 Systematic Errors, Nuisance Parameters & 
Auxiliary Measurements 

Sections 2.3 and 2.4 represent the state of the art for 
HEP in frequentist hypothesis testing in the absence
of uncertainties on rates and shapes of distributions. 
In practice, the true rate of background is not known 
exactly, and the shapes of distributions are sensitive 
to experimental quantities, such as calibration coef- 
ficients and particle identification efficiencies (which 
are also not known exactly). What one would call a
systematic error in HEP usually corresponds to what
a statistician would refer to as a nuisance parameter.

Dealing with nuisance parameters in searches is 
not a new problem, but perhaps it has never been as 
essential as it is for the LHC. In these proceedings, 
Cousins reviews the different approaches to nuisance 
parameters in HEP and the professional statistical 
literature. 20 Also of interest is the classification of 
systematic errors provided by Sinervo. 21 In Sec. 4, 
a few techniques for incorporating nuisance parameters
are reviewed.

From an experimental point of view, the miss- 
ing ingredient is some set of auxiliary measurements 
that will constrain the value of the nuisance param- 
eters. The most common example would be a side- 
band measurement to fix the background rate, or 
some control sample used to assess particle identification
efficiency. Previously, I used the variable
$M$ to denote this auxiliary measurement, 22 while
Linnemann, 19 Cousins, 20 and Rolke, Lopez, and
Conrad 23,24 used $y$. Additionally, one needs to know
the likelihood function that provides the connection
between the nuisance parameter(s) and the auxiliary
measurements.

Figure 3. The signal-like region and sideband for $H \to \gamma\gamma$ in
which $\tau$ is correlated to $b$ via the model parameter $a$.

The most common choices for the likelihood of
the auxiliary measurement are $L(y|b) = \mathrm{Pois}(y|\tau b)$
and $L(y|b) = G(y|\tau b, \sigma_y)$, where $\tau$ is a constant that
specifies the ratio of the number of events one expects
in the sideband region to the number expected in the
signal-like region.^d

A constant $\tau$ is appropriate when one simply
counts the number of events $y$ in an "off-source" measurement.
In a more typical case, one uses the distribution
of some other variable, call it $m$, to estimate
the number of background events inside a range of
$m$ (see Fig. 3). In special cases the ratio $\tau$ is independent
of the model parameters. However, in many
cases (e.g. $f(m) \propto e^{-am}$), the ratio $\tau$ depends on the
model parameters. Moreover, sometimes the sideband
is contaminated with signal events, thus the
background and signal estimates can be correlated.
These complications are not a problem as long as
they are incorporated into the likelihood.
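The dependence of the sideband-to-signal ratio on a shape parameter is easy to see for an exponentially falling background. In the sketch below the window boundaries and the slope values are hypothetical, chosen only to illustrate the effect; the integrals of $e^{-am}$ are done analytically.

```python
import math

def exp_integral(a, lo, hi):
    """Integral of exp(-a*m) over [lo, hi]; reduces to hi - lo for a = 0."""
    if a == 0.0:
        return hi - lo
    return (math.exp(-a * lo) - math.exp(-a * hi)) / a

def sideband_ratio(a, signal_window, sidebands):
    """Expected background in the sidebands divided by that in the signal window."""
    side = sum(exp_integral(a, lo, hi) for lo, hi in sidebands)
    return side / exp_integral(a, *signal_window)

# Hypothetical mass windows (arbitrary units).
signal_window = (118.0, 122.0)
sidebands = [(100.0, 118.0), (122.0, 150.0)]
for slope in (0.0, 0.02, 0.05):
    print(slope, sideband_ratio(slope, signal_window, sidebands))
```

For a flat background the ratio is purely geometric, while for a falling spectrum it shifts with the slope, so an uncertainty on the slope propagates directly into the background estimate.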

The number of nuisance parameters and aux- 
iliary measurements can grow quite large. For in- 
stance, the standard practice at BaBar is to form 
very large likelihood functions that incorporate ev- 
erything from the parameters of the unitarity tri- 
angle to branching fractions and detector response. 
These likelihoods are typically factorized into multiple
pieces, which are studied independently at first
and later combined to assess correlations. The factorization
of the likelihood and the number of nuisance
parameters included impact the difficulty of
implementing the various scenarios considered below.

d Note that Linnemann 19 used $\alpha = 1/\tau$ instead, but in this
paper $\alpha$ is reserved for the rate of Type I error.

3 Practical and Toy Examples 

In this Section, a few practical and toy examples are 
introduced. The toy examples are meant to provide 
simple scenarios where results for different methods 
can be easily obtained in order to expedite their com- 
parison. The practical examples are meant to ex- 
clude methods that provide nice solutions to the toy 
examples, but do not generalize to the realistic situ- 
ation. 

3.1 The Canonical Example 

Consider a number-counting experiment that measures
$x$ events in the signal-like region and $y$ events
in some sideband. For a given background rate $b$ in
the signal-like region, say one can expect $\tau b$ events
in the sideband. Additionally, let the rate of signal
events in the signal-like region - the parameter of interest -
be denoted $\mu$. The corresponding likelihood
function is

$$
L_P(x, y|\mu, b) = \mathrm{Pois}(x|\mu + b) \cdot \mathrm{Pois}(y|\tau b).
\qquad (14)
$$

This is the same case that was considered in
Refs. 20,22,23,24 for $x, y = O(10)$ and $\alpha = 5\%$.

For LHC searches, we will be more interested in
$x, y = O(100)$ and $\alpha = 2.85 \cdot 10^{-7}$. Furthermore, the
auxiliary measurement will rarely be a pure number-counting
sideband measurement, but instead the result
of some fit. So let us also consider the likelihood
function

$$
L_G(x, y|\mu, b) = \mathrm{Pois}(x|\mu + b) \cdot G(y|\tau b, \sqrt{\tau b}).
\qquad (15)
$$
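Both likelihoods of this canonical example are simple to code. The sketch below evaluates their logarithms, assuming $\mu + b > 0$ and $\tau b > 0$; the function names are illustrative.

```python
import math

def log_pois(n, mean):
    """Logarithm of the Poisson probability of n events."""
    return n * math.log(mean) - mean - math.lgamma(n + 1)

def log_gauss(y, mean, sigma):
    """Logarithm of the Gaussian density at y."""
    return -0.5 * ((y - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def log_L_P(x, y, mu, b, tau=1.0):
    """Poisson main measurement times Poisson sideband (Eq. 14)."""
    return log_pois(x, mu + b) + log_pois(y, tau * b)

def log_L_G(x, y, mu, b, tau=1.0):
    """Poisson main measurement times Gaussian sideband of width sqrt(tau*b) (Eq. 15)."""
    return log_pois(x, mu + b) + log_gauss(y, tau * b, math.sqrt(tau * b))

print(log_L_P(150, 100, 50.0, 100.0), log_L_G(150, 100, 50.0, 100.0))
```

For counts of order 100 the two models agree closely near the bulk of the distribution; the differences that matter for a $5\sigma$ test sit in the far tails.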

As a concrete example in the remaining sections,
let us consider the case $b = 100$ and $\tau = 1$. Operationally,
one would measure $y$ and then find the value
$x_{crit}(y)$ necessary for discovery. In the language of
confidence intervals, $x_{crit}(y)$ is the value of $x$ necessary
for the $100(1 - \alpha)\%$ confidence interval in $\mu$
to exclude $\mu = 0$. In Sec. 4 we check the coverage
(Type I error or false-discovery rate) for both $L_P$ and
$L_G$.

Linnemann reviewed thirteen methods and
eleven published examples of this scenario. 19 Of the
published examples, only three (the one from his reference
18 and the two from 19) are near the range of
$x$, $y$, and $\alpha$ relevant for LHC searches. Linnemann's
review asks a question equivalent to the one posed in this
paper, but in a different way: what is the significance
($Z$ in Eq. 12) of a given observation $x, y$?

Figure 5. Two plausible shapes for the continuum $\gamma\gamma$ mass
spectrum at the LHC.

3.2 The LHC Standard Model Higgs Search 

The search for the standard model Higgs boson is 
by no means the only interesting search to be per- 
formed at the LHC, but it is one of the most studied 
and offers a particularly challenging set of channels 
to combine with a single method. Figure 1 shows 
the expected significance versus the Higgs mass, $m_H$,
for several channels individually and in combination
for the Atlas experiment. 25 Two mass points are
considered in more detail in Tab. 1, including results
from Refs. 1,25,26. Some of these channels will
most likely use a discriminating variable distribu- 
tion, f(m), to improve the sensitivity as described 
in Sec. 2.3. I have indicated the channels that I sus- 
pect will use this technique. Rough estimates on the 
uncertainty in the background rate have also been 
tabulated, without regard to the classification pro- 
posed by Sinervo. 

The background uncertainties for the $t\bar{t}H$ channel
have been studied in some detail and separated
into various sources. 26 Figure 4 shows the $m_{bb}$ mass
spectrum for this channel.^e Clearly, the shape of
the background-only distribution is quite similar to
the shape of the signal-plus-background distribution.
Furthermore, theoretical uncertainties and $b$-tagging
uncertainties affect the shape of the background-only
spectrum. In this case the incorporation of systematic
error on the background rate most likely precludes
the expected significance of this channel from
ever reaching $5\sigma$.

e It is not clear if this result is in agreement with the equivalent
CMS result. 27

Similarly, the $H \to \gamma\gamma$ channel has uncertainty
in the shape of the spectrum from background
processes. One contribution to this uncertainty
comes from the electromagnetic energy scale of the
calorimeter (an experimental nuisance parameter),
while another contribution comes from the theoretical
uncertainty in the continuum $\gamma\gamma$ production. Figure
5 shows two plausible shapes for the $m_{\gamma\gamma}$ spectrum
from "Born" and "Box" predictions.

4 Review of Methods 

Based on the practical example of the standard 
model Higgs search at the LHC and the discussion 
in Sec. 2, the list of admissible methods is quite 
short. Of the thirteen methods reviewed by Linne- 
mann, only five are considered as reasonable or rec- 
ommended. These can be divided into three classes: 
hybrid Bayesian-frequentist methods, methods based 
on the Likelihood Principle, and frequentist methods 
based on the Neyman construction. 

4.1 Hybrid Bayesian-Frequentist Methods

Table 1. Number of signal and background events for representative Higgs search channels for two values of Higgs mass, $m_H$,
with 30 fb$^{-1}$ of data. A rough uncertainty on the background rate is denoted as $\delta b/b$, without reference to the type of systematic
uncertainty. The table also indicates if the channels are expected to use a weight $f(m)$ as in Eq. 2.

The class of methods frequently used in HEP and
commonly referred to as the Cousins-Highland technique
(or secondarily Bayes in statistical literature)
is based on a Bayesian average of frequentist p-values
as found in the first equation of Ref. 28. The
Bayesian average is over the nuisance parameters and
weighted by the posterior $P(b|y)$. Thus the p-value
of the observation $(x_0, y_0)$ evaluated at $\mu$ is given by

$$
p(x_0, y_0|\mu) = \int_0^{\infty} db\; p(x_0|\mu, b)\, P(b|y_0) \qquad (16)
$$

$$
= \int_{x_0}^{\infty} dx\; P(x|\mu, y_0) \qquad (17)
$$

where

$$
P(x|\mu, y_0) = \int_0^{\infty} db\; P(x|\mu, b)\, \frac{P(y_0|b)\, P(b)}{P(y_0)}. \qquad (18)
$$

The form in Eq. 16, an average over p-values, is similar
to the form written in Cousins & Highland's article;
it is re-written in Eq. 17 in the form that is
more familiar to those from LEP Higgs searches. 16,17
Actually, the dependence on y and the Bayesian 
prior P(b) shown explicitly in Eq. 18 is often not 
appreciated by those that use this method. 

The specific methods that Linnemann considers
correspond to different choices of Bayesian priors.
The most common in HEP is to ignore the prior and
use a truncated Gaussian for the posterior $P(b|y_0)$,
which Linnemann calls $Z_N$. For the case in which
the likelihood $L(y|b)$ is known to be Poisson, Linnemann
prefers to use a flat prior, which gives rise to a
Gamma-distributed posterior and Linnemann's second
preferred method $Z_\Gamma$, which is identical to the
ratio of Poisson means $Z_{Bi}$ and can be written in
terms of (in)complete beta functions as 19

$$
Z_\Gamma = Z_{Bi} = B(1/(1 + \tau), x, y + 1)/B(x, y + 1). \qquad (19)
$$
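The equivalence between the flat-prior hybrid calculation and the binomial (ratio of Poisson means) test can be checked numerically. With a Poisson sideband and a flat prior, the posterior for $b$ is a Gamma distribution, and averaging the Poisson probability for $x$ over it gives a negative binomial distribution in $x$. The sketch below sums its upper tail and compares it with the binomial tail probability, which is what the ratio of beta functions in Eq. 19 evaluates to before conversion to $Z$ via Eq. 12. The function names are illustrative.

```python
import math
from statistics import NormalDist

def log_choose(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def p_hybrid_flat_prior(x_obs, y_obs, tau=1.0):
    """Tail probability P(x >= x_obs | mu = 0, y_obs) after averaging the
    Poisson model over the Gamma posterior for b (flat prior, Poisson sideband)."""
    rho = 1.0 / (1.0 + tau)
    total, x = 0.0, x_obs
    while True:
        term = math.exp(log_choose(x + y_obs, x) + x * math.log(rho)
                        + (y_obs + 1) * math.log(1.0 - rho))
        total += term
        if term <= 1e-16 * total:   # remaining tail is negligible
            return total
        x += 1

def p_binomial(x_obs, y_obs, tau=1.0):
    """P(k >= x_obs) for k ~ Binomial(x_obs + y_obs, 1/(1 + tau))."""
    rho = 1.0 / (1.0 + tau)
    n = x_obs + y_obs
    return sum(math.exp(log_choose(n, k) + k * math.log(rho)
                        + (n - k) * math.log(1.0 - rho))
               for k in range(x_obs, n + 1))

p = p_hybrid_flat_prior(185, 100)
print(p, p_binomial(185, 100), -NormalDist().inv_cdf(p))
```

The two tail probabilities agree to numerical precision, which is the content of $Z_\Gamma = Z_{Bi}$ for this model.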

The method Linnemann calls $Z_{5'}$ can be seen as an
approximation of $Z_N$ for large signals and is what
Atlas used to assess its physics potential. 1 The
method is not recommended by Linnemann and was
critically reviewed in Ref. 29.

$$
x_{crit}(y) = y/\tau + Z \sqrt{y/\tau\,(1 + 1/\tau)} \qquad (20)
$$
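Evaluating Eq. 20 for the concrete example of Sec. 3.1 takes one line. The sketch below assumes the reading of Eq. 20 as a background estimate $y/\tau$ plus $Z$ standard deviations, with variance $(y/\tau)(1 + 1/\tau)$; the function name is hypothetical.

```python
import math

def x_crit_gaussian(y, tau=1.0, z=5.0):
    """Critical number of events from Eq. 20 (Gaussian approximation)."""
    b_hat = y / tau                       # background estimate from the sideband
    return b_hat + z * math.sqrt(b_hat * (1.0 + 1.0 / tau))

print(x_crit_gaussian(100, tau=1.0, z=5.0))   # 100 + 5 * sqrt(200)
```

For $y = 100$ and $\tau = 1$ this gives a threshold of about 171 events, noticeably below the value $x_{crit} = 185$ quoted in the caption of Fig. 6 for the same inputs. That gap suggests how much a Gaussian approximation can understate the evidence required at the $5\sigma$ level.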

4.2 Likelihood Intervals

As Cousins points out, the professional statistics
literature seems less concerned with providing correct
coverage by construction, in favor of likelihood-based
and Bayesian methods. The likelihood principle
states that given a measurement $x$ all inference
about $\mu$ should be based on the likelihood function
$L(x|\mu)$. When nuisance parameters are included,
things get considerably more complicated.

The profile likelihood function is an attempt to
eliminate the nuisance parameters from the likelihood
function by replacing them with their conditional
maximum likelihood estimates (denoted, for
example, $\hat{\hat{b}}$). The profile likelihood for $L_P$ in Eq. 14
is given by $L(x, y|\mu_0, \hat{\hat{b}}(\mu_0))$, with

$$
\hat{\hat{b}}(\mu_0) = \frac{x + y - (1 + \tau)\mu_0}{2(1 + \tau)}
 + \frac{\sqrt{(x + y - (1 + \tau)\mu_0)^2 + 4(1 + \tau)\, y \mu_0}}{2(1 + \tau)}.
\qquad (21)
$$

The relevant likelihood ratio is then

$$
\lambda_P(\mu_0|x, y) = \frac{L(x, y|\mu_0, \hat{\hat{b}}(\mu_0))}{L(x, y|\hat{\mu}, \hat{b})}
\qquad (22)
$$

where $\hat{\mu}$ and $\hat{b}$ are the unconditional maximum likelihood
estimates.
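The profile likelihood ratio for the Poisson model of Eq. 14 can be evaluated in closed form. The sketch below assumes $x > 0$ and $y > 0$, uses the unconditional estimates $\hat{\mu} = x - y/\tau$ and $\hat{b} = y/\tau$, and drops the factorial terms, which cancel in the ratio. The function names are illustrative.

```python
import math

def log_L_P(x, y, mu, b, tau):
    """Log-likelihood of Eq. 14 without the constant factorial terms."""
    return x * math.log(mu + b) - (mu + b) + y * math.log(tau * b) - tau * b

def b_hat_hat(x, y, mu0, tau):
    """Conditional maximum likelihood estimate of b for fixed mu0 (Eq. 21)."""
    c = x + y - (1.0 + tau) * mu0
    return (c + math.sqrt(c * c + 4.0 * (1.0 + tau) * y * mu0)) / (2.0 * (1.0 + tau))

def minus_two_ln_lambda(x, y, mu0, tau=1.0):
    """-2 ln of the profile likelihood ratio of Eq. 22."""
    mu_hat, b_hat = x - y / tau, y / tau
    numerator = log_L_P(x, y, mu0, b_hat_hat(x, y, mu0, tau), tau)
    denominator = log_L_P(x, y, mu_hat, b_hat, tau)
    return -2.0 * (numerator - denominator)

# Scan for the example x = 185, y = 100, tau = 1.
for mu0 in (0.0, 20.0, 50.0, 85.0, 120.0):
    print(mu0, minus_two_ln_lambda(185, 100, mu0))
```

The curve vanishes at $\mu_0 = \hat{\mu} = 85$ and rises on either side. Its value at $\mu_0 = 0$ is what gets compared with the threshold of 25 when asking whether the background-only hypothesis is excluded at the $5\sigma$ level in the asymptotic approximation.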

One of the standard results from statistics is that
the distribution of $-2 \ln \lambda$ converges to the $\chi^2$ distribution
with $k$ degrees of freedom, where $k$ is the
number of parameters of interest. In our example
$k = 1$, so a $5\sigma$ confidence interval is defined by the
set of $\mu$ with $-2 \ln \lambda(\mu|x, y) < 25$. Figure 6 shows
the graph of $-2 \ln \lambda(\mu|x, y)$ for $y = 100$ at the critical
value of $x$ for a $5\sigma$ discovery.

Figure 6. The profile likelihood ratio $-2 \ln \lambda$ versus the signal
strength $\mu$ for $y = 100$, $\tau = 1$, and $x = x_{crit}(y) = 185$.

At PhyStat2003, Nancy Reid presented var- 
ious adjustments and improvements to the pro- 
file likelihood which speed asymptotic convergence 
properties. 30 Cousins considers these methods in 
more detail from a physicist's perspective. 20

Only recently was it generally appreciated that 
the method of Minuit 31 commonly used in HEP cor- 
responds to the profile likelihood intervals. The cov- 
erage of these methods is not guaranteed, but has 
been studied in simple cases. 23,24 These likelihood- 
based techniques are quite promising for searches at 
the LHC, but their coverage properties must be as- 
sessed in the more complicated context of the LHC 
with weighted events and several channels. In par- 
ticular, the distribution of q in Eq. 10 is often highly 
non-Gaussian. 

4.3 The Neyman Construction with Systematics

Linnemann's preferred method, $Z_{Bi}$, is related to
the familiar result on the ratio of Poisson means. 32
Unfortunately, the form of $Z_{Bi}$ is tightly coupled
to the form of Eq. 14, and cannot be directly applied
to the more complicated cases described above.
However, the standard result on the ratio of Poisson
means 32 and Cousins' improvement 33 are actually
special cases of the Neyman construction with
nuisance parameters (with and without conditioning,
respectively).
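For reference, the ratio-of-Poisson-means result can be sketched as a binomial test: conditional on the total x + y, and under the null hypothesis, x follows a binomial distribution with success probability 1/(1 + $\tau$). The Python sketch below assumes this standard form and assumes the two-Poisson model for Eq. 14; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def log_binom_pmf(k, n, rho):
    """Log of the binomial probability P(K = k | n, rho)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(rho) + (n - k) * math.log1p(-rho))


def z_bi(x, y, tau):
    """Significance from the binomial test on the ratio of Poisson means.

    Assumes x ~ Pois(mu + b) and y ~ Pois(tau * b); under mu = 0 the
    count x given n = x + y is Binomial(n, 1 / (1 + tau)).
    Requires a p-value strictly between 0 and 1.
    """
    n = x + y
    rho = 1.0 / (1.0 + tau)
    # One-sided p-value: probability of x or more events in the main measurement.
    p = sum(math.exp(log_binom_pmf(k, n, rho)) for k in range(x, n + 1))
    # Convert the one-sided p-value to an equivalent number of Gaussian sigma.
    return -NormalDist().inv_cdf(p)


if __name__ == "__main__":
    print(z_bi(185, 100, 1.0))
```

With x = 185, y = 100 and $\tau$ = 1 this gives a significance close to 5.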

Of course, the Neyman construction does gener- 
alize to the more complicated cases discussed above. 
Two particular types of constructions have been pre- 
sented, both of which are related to the profile like- 
lihood ratio discussed in Kendall's chapter on likeli- 
hood ratio tests & test efficiency. 34 This relationship 
often leads to confusion with the profile likelihood 
intervals discussed in Sec. 4.2. 

The first method is a full Neyman construction
over both the parameters of interest and the nuisance
parameters, using the profile likelihood ratio
as an ordering rule. Using this method, the nuisance
parameter is "projected out", leaving only an interval
in the parameters of interest. I presented this
method at PhyStat2003 in the context of hypothesis
testing,^f and similar work was presented by Punzi
at this conference. 22,35 This method provides coverage
by construction, independent of the ordering rule
used.

The motivation for using the profile likelihood
ratio as a test statistic is twofold. First, it is inspired
by the Neyman-Pearson lemma in the same way as
the Feldman-Cousins ordering rule. Second, it is
independent of the nuisance parameters, providing
some hope of obtaining similar tests.^g Both Punzi
and I found a need to perform some "clipping"
of the acceptance regions to prevent irrelevant
values of the nuisance parameters from spoiling the
projection. For this technique to be broadly applicable,
some generalization of this clipping procedure is
needed and the scalability with the number of
parameters must be addressed.^h

The second method, presented by Feldman at
the Fermilab conference in 2000, involves a Neyman
construction over the parameters of interest, but
the nuisance parameters are fixed to the conditional
maximum likelihood estimate: a method I will call
the profile construction. The profile construction is
an approximation of the full construction that does
not necessarily cover. To the extent that the use of
the profile likelihood ratio as a test statistic provides
similar tests, the profile construction has good coverage
properties. The main motivation for the profile
construction is that it scales well with the number of
nuisance parameters and that the "clipping" is built
in (only one value of the nuisance parameters is
considered).

^f In simple hypothesis testing $\mu$ is not a continuous parameter,
but only takes on the values $\mu_0 = 0$ or $\mu_1 = s$.
^g Similar tests are those in which the critical regions of size $\alpha$
are independent of the nuisance parameters. Similar tests do
not exist in general.
^h A Monte Carlo sampling of the nuisance parameter space
could be used to curb the curse of dimensionality. 22
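The profile construction can be sketched with toy Monte Carlo. The Python example below is an illustration only: it assumes the two-Poisson model Pois(x | $\mu$ + b) Pois(y | $\tau$ b), uses $-2\ln\lambda_P$ as the test statistic with $\mu$ unconstrained in the denominator, and works at 95% confidence level, because a direct $5\sigma$ construction would need far more toys than is practical here. All function names and the choice of sampler are hypothetical.

```python
import math
import random


def xlogy(n, m):
    """n * log(m) with the convention 0 * log(0) = 0."""
    return n * math.log(m) if n > 0 else 0.0


def cond_mle_b(mu0, x, y, tau):
    """Conditional MLE of b for fixed mu0 (assumed two-Poisson model)."""
    s = x + y - (1.0 + tau) * mu0
    return (s + math.sqrt(s * s + 4.0 * (1.0 + tau) * y * mu0)) / (2.0 * (1.0 + tau))


def q_stat(mu0, x, y, tau):
    """Test statistic -2 ln lambda_P(mu0 | x, y)."""
    b = cond_mle_b(mu0, x, y, tau)
    ll_cond = xlogy(x, mu0 + b) - (mu0 + b) + xlogy(y, tau * b) - tau * b
    ll_best = xlogy(x, x) - x + xlogy(y, y) - y
    return -2.0 * (ll_cond - ll_best)


def poisson(rng, mean):
    """Knuth's multiplication sampler; adequate for means of order 100."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def profile_construction_threshold(mu0, x_obs, y_obs, tau, cl, n_toys, seed=1):
    """Critical value of the test statistic with b fixed at its profile value."""
    rng = random.Random(seed)
    # Only one value of the nuisance parameter is considered.
    b_fixed = cond_mle_b(mu0, x_obs, y_obs, tau)
    toys = sorted(
        q_stat(mu0, poisson(rng, mu0 + b_fixed), poisson(rng, tau * b_fixed), tau)
        for _ in range(n_toys)
    )
    return toys[min(n_toys - 1, int(cl * n_toys))]


if __name__ == "__main__":
    x_obs, y_obs, tau = 120, 100, 1.0
    threshold = profile_construction_threshold(0.0, x_obs, y_obs, tau, 0.95, 4000)
    print(threshold, q_stat(0.0, x_obs, y_obs, tau) > threshold)
```

In this sketch the critical value comes from the sampled distribution of the test statistic at the profiled value of b, not from the asymptotic $\chi^2$ distribution. The full construction would repeat the sampling over a grid of b values.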

It appears that the CHOOZ experiment actually 
performed both the full construction (called "FC cor- 
rect syst.") and the profile construction (called "FC 
profile") in order to compare with the strong confi- 
dence technique. 36 

Another perceived problem with the full con- 
struction is that bad over-coverage can result from 
the projection onto the parameters of interest. It 
should be made very clear that the coverage proba- 
bility is a function of both the parameters of interest 
and the nuisance parameters. If the data are con- 
sistent with the null hypothesis for any value of the 
nuisance parameters, then one should probably not 
reject it. This argument is stronger for nuisance pa- 
rameters directly related to the background hypoth- 
esis, and less strong for those that account for instru- 
mentation effects. In fact, there is a family of meth- 
ods that lie between the full construction and the 
profile construction. Perhaps we should pursue a hy- 
brid approach in which the construction is formed for 
those parameters directly linked to the background 
hypothesis, the additional nuisance parameters take 
on their profile values, and the final interval is pro- 
jected onto the parameters of interest. 

5 Results with the Canonical Example 

Consider the case $b_{true} = 100$, $\tau = 1$ (i.e. 10%
systematic uncertainty). For each of the methods we
find the critical boundary, $x_{crit}(y)$, which is necessary
to reject the null hypothesis $\mu_0 = 0$ at $5\sigma$ when
y is measured in the auxiliary measurement. Figure 7
shows the contours of $L_G$, from Eq. 15, and the critical
boundary for several methods. The far left curve
shows the simple $s/\sqrt{b}$ curve neglecting systematics.
The far right curve shows a critical region with the
correct coverage. With the exception of the profile
likelihood, $\lambda_P$, all of the other methods lie between
these two curves (i.e. all of them under-cover). The
rate of Type I error for these methods was evaluated
for $L_G$ and $L_P$ and is presented in Table 2.
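The rate of Type I error for a given critical boundary can also be computed by direct summation rather than Monte Carlo. The Python sketch below does this for the simple $s/\sqrt{b}$ boundary, assuming events are generated from the two-Poisson model Pois(x | b) Pois(y | $\tau$ b) under the null hypothesis, that the background estimate is y/$\tau$, and that the null is rejected when x is at or above the boundary. These choices, and the function names, are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def pois_pmf(n, mean):
    """Poisson probability P(N = n | mean), computed in log space."""
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def type1_error(b_true, tau, x_crit, n_max=1000):
    """alpha = sum_y Pois(y | tau*b) * P(x >= x_crit(y) | b), truncated at n_max."""
    # Upper-tail probabilities P(x >= k | b_true) for k = 0..n_max.
    tail = [0.0] * (n_max + 2)
    for k in range(n_max, -1, -1):
        tail[k] = tail[k + 1] + pois_pmf(k, b_true)
    alpha = 0.0
    for y in range(n_max + 1):
        k = math.ceil(x_crit(y))  # smallest integer x in the critical region
        if k <= n_max:
            alpha += pois_pmf(y, tau * b_true) * tail[max(k, 0)]
    return alpha


def naive_boundary(y, tau, n=5.0):
    """Boundary that ignores the uncertainty on b: x = b_est + n * sqrt(b_est)."""
    b_est = y / tau
    return b_est + n * math.sqrt(b_est)


if __name__ == "__main__":
    tau = 1.0
    alpha = type1_error(100.0, tau, lambda y: naive_boundary(y, tau))
    # Express the rate as an equivalent one-sided Gaussian significance.
    print(alpha, -NormalDist().inv_cdf(alpha))
```

The resulting equivalent significance can be compared with the "No Syst" row of Table 2. Replacing `naive_boundary` with another boundary function gives the corresponding rate for that method under the same assumptions.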



[Figure 7 plot: contours for $b_{true} = 100$, critical regions for $\tau = 1$; horizontal axis x.]

Figure 7. A comparison of the various methods' critical boundary
$x_{crit}(y)$ (see text). The concentric ovals represent contours
of $L_G$ from Eq. 15.



The results of the full Neyman construction and
the profile construction are not presented. The full
Neyman construction covers by construction, and
it was previously demonstrated for a similar case
($b = 100$, $\tau = 4$) that the profile construction gives
similar results. 22 Furthermore, if $\lambda_P$ were used as
an ordering rule in the full construction, the critical
region for b = 100 would be identical to the curve
labeled "$\lambda_P$ profile" (since $\lambda_P$ actually covers).

It should be noted that if one knows the likelihood
is given by $L_G(x, y | \mu, b)$, then one should use
the corresponding profile likelihood ratio, $\lambda_G(x, y | \mu)$,
for the hypothesis test. However, knowledge of the
correct likelihood is not always available (Sinervo's
Class II systematic), so it is informative to check
the coverage of tests based on both $\lambda_G(x, y | \mu)$ and
$\lambda_P(x, y | \mu)$ by generating Monte Carlo according to
$L_G(x, y | \mu, b)$ and $L_P(x, y | \mu, b)$. In a similar way, this
decoupling of the true likelihood and the assumed
likelihood (used to find the critical region) can break the
"guaranteed" coverage of the Neyman construction.

It is quite significant that the $Z_N$ method under-covers,
since it is so commonly used in HEP. The degree
to which the method under-covers depends on
the truncation of the Gaussian posterior $P(b|y)$.
Linnemann's table also shows significant under-coverage
(overestimate of the significance Z). In order to
obtain a critical region with the correct coverage, the
author modified the region to $x_{crit}(y) = x_{crit}^{Z_N}(y) + C$
and found that C = 16 provided the correct coverage. A
discrepancy of 16 events is not trivial!

Table 2. Rate of Type I error interpreted as equivalent $Z\sigma$ for
various methods designed for a $5\sigma$ test. Monte Carlo events
are generated via either $L_G$ or $L_P$. The critical x for y = 100
is also listed for easy comparison.

Method                    $L_G$ ($Z\sigma$)    $L_P$ ($Z\sigma$)    $x_{crit}(y = 100)$
No Syst                   3.0                  3.1                  150
$Z_{5'}$                  4.1                  4.1                  171
$Z_N$ (Sec. 4.1)          4.2                  4.2                  178
ad hoc                    4.6                  4.7                  188
$Z_\Gamma = Z_{Bi}$       4.9                  5.0                  185
profile $\lambda_P$       5.0                  5.0                  185
profile $\lambda_G$       4.7                  4.7                  ~182



Notice that for large x, y the Bayesian-frequentist
hybrid $Z_N$ approaches $Z_{5'}$, where the
critical region is of the form $x_{crit}(y) = y/\tau + n\sqrt{y/\tau}$.
Because the boundary is very nearly linear around
$y_0$, one can find the value of n that gives the proper
coverage with a little geometry. In particular, the
number n needed to get a $Z\sigma$ test gives
x cn t{y) = v/t + ZVl+l/rm^ (23) 
where 

f l + -L=) (24) 



The $m^2$ factor can be seen as a correction to the $Z_{5'}$
and $Z_N$ results. Notice that the correction is larger
for higher significance tests. As an ad hoc method, I
experimented with the $Z_N$ method, replacing $\tau$ with
$\tau m^2$ in the posterior $P(b|y)$. The coverage of this ad
hoc method is better than that of $Z_N$, but not exact because
x, y are not sufficiently large.

6 Conclusions 

I have presented the statistical challenges of searches 
at the LHC and the current state of the statistical 
methods commonly used in HEP. I have attempted 
to accurately portray the complexity of the searches, 
explain their key ingredients, and provide a practical 
example for future studies. Three classes of methods,
which are able to incorporate all the ingredients, have
been identified: hybrid Bayesian-frequentist methods,
methods based on the Likelihood Principle, and
frequentist methods based on the Neyman construction.



The Bayesian-frequentist hybrid method, $Z_N$,
shows significant under-coverage in the toy example
considered when pushed to the $5\sigma$ regime. While
Bayesians might not care about coverage, significant
under-coverage is undesirable in HEP. Further study
is needed to determine if a more careful choice of
prior distributions can remedy this situation,
especially in more complex situations. The improved
coverage of $Z_\Gamma$ may give some guidance.

The methods based on the likelihood principle
have gained a great deal of attention from HEP in
recent years. While the methods appear to do well in
the toy example, further study is required to determine
their properties in the more realistic situation
with weighted events.

Slowly, the HEP community is coming to grips 
with how to incorporate nuisance parameters into the 
Neyman construction. Several ideas for reducing the 
over-coverage induced by projecting out the nuisance 
parameters and reducing the computational burden 
have been presented. A hybrid approach between the 
full construction and the profile construction should 
be investigated in more detail. 

Finally, it seems that the HEP community is 
approaching a point where we appreciate the fun- 
damental statistical issues, the limitations of some 
methods, and the benefits of others. Clearly, the 
philosophical debate has not ended, but there seems 
to be more emphasis on practical solutions to our 
very challenging problems. 